Semandaq: a data quality system based on conditional functional dependencies

Authors

  • Wenfei Fan
  • Floris Geerts
  • Xibei Jia
Abstract

We present SEMANDAQ, a prototype system for improving the quality of relational data. Based on the recently proposed conditional functional dependencies (CFDs), it detects and repairs errors and inconsistencies that emerge as violations of these constraints. We demonstrate the following functionalities supported by SEMANDAQ: (a) an interface for specifying CFDs; (b) a visual tool for automated detection of CFD violations in relational data, leveraging efficient SQL-based techniques; (c) extensive visual data exploration capabilities that provide the user with various measures of the quality of the data; (d) repair (cleaning) functionality without excess human interaction, built upon CFD-based cleaning algorithms; we show how SEMANDAQ allows for a natural exploration of the quality of the obtained repairs. SEMANDAQ is a promising tool that provides easy access and user-friendly data quality facilities for any relational database system.
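To make the idea of SQL-based CFD violation detection in (b) concrete, the following is a minimal sketch and not SEMANDAQ's actual implementation. The `cust` table, its columns, the sample rows, and the function `find_cfd_violations` are all hypothetical and chosen only for illustration. It checks two assumed CFDs: a constant one (if `cc = '01'` and `ac = '908'` then `city = 'MH'`) and a variable one (among tuples with `cc = '44'`, `zip` determines `street`).

```python
import sqlite3

# Hypothetical customer rows: (id, cc, ac, zip, street, city).
ROWS = [
    (1, "44", "131", "EH4 1DT", "Mayfield", "EDI"),
    (2, "44", "131", "EH4 1DT", "Crichton", "EDI"),  # same zip, different street
    (3, "01", "908", "07974", "Mtn Ave", "NYC"),     # city should be 'MH'
    (4, "01", "908", "07974", "Mtn Ave", "MH"),
    (5, "01", "212", "10012", "Elm St", "NYC"),
    (6, "01", "212", "10012", "Oak St", "NYC"),      # cc is not '44', so no violation
]


def find_cfd_violations(conn):
    """Return (ids violating the constant CFD, zips violating the variable CFD)."""
    # Constant CFD: a single tuple violates it when it matches the
    # left-hand-side pattern but carries the wrong right-hand-side constant.
    constant = conn.execute(
        "SELECT id FROM cust "
        "WHERE cc = '01' AND ac = '908' AND city <> 'MH' "
        "ORDER BY id"
    ).fetchall()
    # Variable CFD: a group of tuples violates it when they match the pattern,
    # agree on zip, and still disagree on street.
    variable = conn.execute(
        "SELECT zip FROM cust "
        "WHERE cc = '44' "
        "GROUP BY zip "
        "HAVING COUNT(DISTINCT street) > 1 "
        "ORDER BY zip"
    ).fetchall()
    return [row[0] for row in constant], [row[0] for row in variable]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust ("
    "id INTEGER PRIMARY KEY, cc TEXT, ac TEXT, zip TEXT, street TEXT, city TEXT)"
)
conn.executemany("INSERT INTO cust VALUES (?, ?, ?, ?, ?, ?)", ROWS)

constant_ids, violating_zips = find_cfd_violations(conn)
print(constant_ids, violating_zips)
```

Rows 5 and 6 share a zip but differ on street, yet they are not reported because the dependency is conditioned on `cc = '44'`. This is what distinguishes a CFD from a plain functional dependency.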

Similar resources

Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions

Received Apr 11, 2017; revised May 5, 2017; accepted May 24, 2017.

Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures have significant importance for data dependencies in data mining. To adapt to exceptions in real data, the measures are used to relax the strictness of CFDs for mor...

Discover Dependencies from Data - A Review

Functional and inclusion dependency discovery is important to knowledge discovery, database semantics analysis, database design, and data quality assessment. Motivated by the importance of dependency discovery, this paper reviews the methods for functional dependency, conditional functional dependency, approximate functional dependency and inclusion dependency discovery in relational databases ...

Analyses and Validation of Conditional Dependencies with Built-in Predicates

This paper proposes a natural extension of conditional functional dependencies (CFDs [14]) and conditional inclusion dependencies (CINDs [8]), denoted by CFD^ps and CIND^ps, respectively, by specifying patterns of data values with ≠, <, ≤, > and ≥ predicates. As data quality rules, CFD^ps and CIND^ps are able to capture errors that commonly arise in practice but cannot be detected by CFDs and CINDs. We...

Discovering Conditional Functional Dependencies to Detect Data Inconsistencies

Poor quality data is a growing and costly problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. In this paper, we present an approach that efficiently and robustly discovers conditional functional dependencies for detecting inconsistencies in data and hence improves data quality. We evaluate our approach empirically...

Mining Constant Conditional Functional Dependencies for Improving Data Quality

This paper applies data mining techniques to data cleaning, using them to discover Constant Conditional Functional Dependencies (CCFDs) from relational databases. These CCFDs are used as business rules for context-dependent data validation. Conditional Functional Dependencies (CFDs) are an extension of functional dependencies (FDs) which captures the consistency of data by su...

Journal:
  • PVLDB

Volume 1

Publication year: 2008